feat(xlang): support serialization for unsigned types and field encoding config #3113

chaokunyang · 2026-01-07T06:11:19Z

Why?

Java doesn't have native unsigned integer types, but many other languages (Rust, Go, C++, Python with ctypes) do. When serializing data across languages, we need to properly handle unsigned integers to ensure correct values and efficient encoding.

For example:

- A Rust u32 with value 3_000_000_000 cannot be represented directly in Java's signed int, whose maximum is 2,147,483,647.
- Variable-length encoding of unsigned integers can skip the zigzag step that signed varints need.
- Cross-language compatibility requires proper unsigned type support in the protocol.
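The first two points can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the varint is a generic 7-bits-per-byte scheme rather than Fory's exact wire format.

```python
import struct


def as_signed_i32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement int32."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned_u32(value: int) -> int:
    """Recover the unsigned reading of a signed 32-bit value."""
    return value & 0xFFFFFFFF


def zigzag32(value: int) -> int:
    """Map a signed 32-bit value to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def varint_len(value: int) -> int:
    """Bytes needed by a generic 7-bits-per-byte varint for a non-negative value."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


u32_value = 3_000_000_000
wrapped = as_signed_i32(u32_value)      # what a 32-bit signed int would hold
restored = as_unsigned_u32(wrapped)     # the same bits read as unsigned

plain_len = varint_len(100)             # unsigned varint, no zigzag
zigzag_len = varint_len(zigzag32(100))  # zigzag doubles 100 to 200
```

Here `wrapped` is -1294967296 and `restored` is 3000000000 again, which is the same recovery Java's `Integer.toUnsignedLong` performs. The value 100 fits in one varint byte as unsigned, while its zigzag form 200 needs two.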

What does this PR do?

1. Adds Unsigned Integer Type Support (All Languages)

- New type IDs: UINT8 (9), UINT16 (10), UINT32 (11), VAR_UINT32 (12), UINT64 (13), VAR_UINT64 (14), TAGGED_UINT64 (15)
- Unsigned types use the same bit width as signed types but interpret values in the unsigned range
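As a rough illustration of the type IDs and value ranges listed above, the following sketch puts them in an enum and checks whether a value fits. The IDs come from this description; the enum name, `BIT_WIDTH` table, and `fits` helper are hypothetical and not part of any Fory API.

```python
from enum import IntEnum


class UnsignedTypeId(IntEnum):
    """Type IDs for unsigned integers as listed in this PR description."""
    UINT8 = 9
    UINT16 = 10
    UINT32 = 11
    VAR_UINT32 = 12
    UINT64 = 13
    VAR_UINT64 = 14
    TAGGED_UINT64 = 15


# Logical bit width per type; the encoding variants share the width of the base type.
BIT_WIDTH = {
    UnsignedTypeId.UINT8: 8,
    UnsignedTypeId.UINT16: 16,
    UnsignedTypeId.UINT32: 32,
    UnsignedTypeId.VAR_UINT32: 32,
    UnsignedTypeId.UINT64: 64,
    UnsignedTypeId.VAR_UINT64: 64,
    UnsignedTypeId.TAGGED_UINT64: 64,
}


def fits(type_id: UnsignedTypeId, value: int) -> bool:
    """Return True if `value` lies in the unsigned range [0, 2**width - 1]."""
    return 0 <= value < (1 << BIT_WIDTH[type_id])
```

For instance, `fits(UnsignedTypeId.UINT32, 3_000_000_000)` holds even though the same value overflows a signed 32-bit integer.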

2. Renames Type Constants for Clarity (All Languages)

- VAR32 → VARINT32
- VAR64 → VARINT64
- H64 → TAGGED_INT64
- VARU32 → VAR_UINT32
- VARU64 → VAR_UINT64
- HU64 → TAGGED_UINT64

3. Java: Adds Type Annotations for Field-Level Control

New annotations allow specifying exact encoding at field level:

- @Uint8Type - Mark field as unsigned 8-bit [0, 255]
- @Uint16Type - Mark field as unsigned 16-bit [0, 65535]
- @Uint32Type(compress=true/false) - Unsigned 32-bit with optional varint encoding
- @Uint64Type(encoding=FIXED/VARINT/TAGGED) - Unsigned 64-bit with encoding options
- @Int32Type(compress=true/false) - Signed 32-bit with optional varint encoding
- @Int64Type(encoding=FIXED/VARINT/TAGGED) - Signed 64-bit with encoding options
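To see why a field might opt out of compression, it helps to compare encoded sizes. The sketch below assumes a generic LEB128-style varint, which may differ in detail from Fory's wire format; the function names are hypothetical.

```python
import struct


def encode_fixed_u32(value: int) -> bytes:
    """Fixed-width encoding: always 4 little-endian bytes."""
    return struct.pack("<I", value)


def encode_varint(value: int) -> bytes:
    """Generic unsigned varint: 7 payload bits per byte, high bit marks continuation."""
    if value < 0:
        raise ValueError("unsigned varint requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


small, large = 5, 3_000_000_000
sizes = {
    "small_fixed": len(encode_fixed_u32(small)),
    "small_varint": len(encode_varint(small)),
    "large_fixed": len(encode_fixed_u32(large)),
    "large_varint": len(encode_varint(large)),
}
```

Small values shrink from 4 bytes to 1 under varint encoding, while values near the top of the 32-bit range grow to 5 bytes. Fields that usually hold large values, such as hashes, are therefore candidates for fixed encoding.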

4. C++: Adds `FORY_FIELD_CONFIG` Macro for Field Encoding Control

struct MyStruct {
  uint32_t fixed_count;
  uint64_t var_id;
  uint64_t tagged_ts;
  FORY_FIELDS_INFO(MyStruct, fixed_count, var_id, tagged_ts);
};

// Configure encoding per field
FORY_FIELD_CONFIG(MyStruct,
  (fixed_count, Encoding::Fixed),    // Use fixed 4-byte UINT32
  (var_id, Encoding::Varint),        // Use VAR_UINT64
  (tagged_ts, Encoding::Tagged)      // Use TAGGED_UINT64
);

5. Rust: Extends `#[fory(...)]` Derive Macro with Encoding Attributes

#[derive(Fory)]
struct MyStruct {
    #[fory(compress = false)]           // Use fixed INT32 instead of VARINT32
    fixed_id: i32,
    
    #[fory(encoding = "tagged")]        // Use TAGGED_INT64
    tagged_ts: i64,
    
    #[fory(encoding = "varint")]        // Use VAR_UINT64 (default for u64)
    var_count: u64,
    
    #[fory(encoding = "fixed")]         // Use fixed UINT64
    fixed_count: u64,
}

6. Go: Extends Struct Tags with `compress` and `encoding` Options

type MyStruct struct {
    FixedI32   int32   `fory:"compress=false"`        // Use fixed INT32
    VarI32     int32   `fory:"encoding=varint"`       // Use VARINT32 (default)
    FixedU32   uint32  `fory:"encoding=fixed"`        // Use fixed UINT32
    TaggedI64  int64   `fory:"encoding=tagged"`       // Use TAGGED_INT64
    VarU64     uint64  `fory:"encoding=varint"`       // Use VAR_UINT64 (default)
    FixedU64   uint64  `fory:"encoding=fixed"`        // Use fixed UINT64
}

Options:

- compress=true/false: for int32/uint32, selects varint (true) or fixed (false) encoding
- encoding=varint/fixed/tagged: for all numeric types, sets the encoding explicitly
  - int32/uint32: "varint" (default) or "fixed"
  - int64/uint64: "varint" (default), "fixed", or "tagged"
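The option rules above amount to a lookup from (integer kind, encoding) to a wire type name. A minimal sketch of that table follows, using the type names from this description; `resolve_type_name` is a hypothetical helper, not a function from any Fory binding.

```python
# (integer kind, encoding) -> wire type name, as described in this PR.
TYPE_NAMES = {
    ("int32", "varint"): "VARINT32",
    ("int32", "fixed"): "INT32",
    ("uint32", "varint"): "VAR_UINT32",
    ("uint32", "fixed"): "UINT32",
    ("int64", "varint"): "VARINT64",
    ("int64", "fixed"): "INT64",
    ("int64", "tagged"): "TAGGED_INT64",
    ("uint64", "varint"): "VAR_UINT64",
    ("uint64", "fixed"): "UINT64",
    ("uint64", "tagged"): "TAGGED_UINT64",
}


def resolve_type_name(kind: str, encoding: str = "varint") -> str:
    """Look up the wire type for a field; "varint" is the default encoding."""
    try:
        return TYPE_NAMES[(kind, encoding)]
    except KeyError:
        raise ValueError(f"unsupported encoding {encoding!r} for {kind}") from None
```

Rejecting `("int32", "tagged")` mirrors the rule that tagged encoding applies only to 64-bit integers.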

7. Python: Adds Type Hints for Encoding Control

from dataclasses import dataclass

from pyfory.types import (
    int32, fixed_int32,           # VARINT32 vs INT32
    int64, fixed_int64, tagged_int64,  # VARINT64 vs INT64 vs TAGGED_INT64
    uint32, fixed_uint32,         # VAR_UINT32 vs UINT32
    uint64, fixed_uint64, tagged_uint64,  # VAR_UINT64 vs UINT64 vs TAGGED_UINT64
)

@dataclass
class MyStruct:
    var_id: int32            # Uses VARINT32
    fixed_id: fixed_int32    # Uses fixed INT32
    tagged_ts: tagged_int64  # Uses TAGGED_INT64
    var_count: uint64        # Uses VAR_UINT64
    fixed_count: fixed_uint64  # Uses fixed UINT64
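
One way a type hint can carry an encoding choice is through `typing.Annotated` metadata. The sketch below uses hypothetical stand-in aliases to show the general idea; it is not how pyfory defines its types, and the `field_encodings` helper is invented for illustration.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

# Hypothetical stand-ins: plain ints tagged with the wire type they should use.
var_uint64 = Annotated[int, "VAR_UINT64"]
fixed_uint64 = Annotated[int, "UINT64"]
tagged_int64 = Annotated[int, "TAGGED_INT64"]


@dataclass
class Example:
    var_count: var_uint64
    fixed_count: fixed_uint64
    tagged_ts: tagged_int64


def field_encodings(cls) -> dict:
    """Read the wire type attached to each annotated field of `cls`."""
    hints = get_type_hints(cls, include_extras=True)
    return {name: hint.__metadata__[0] for name, hint in hints.items()}


encodings = field_encodings(Example)
```

At runtime the fields remain ordinary `int` values; only a serializer that inspects the annotations would see the encoding choice.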

8. Java Internal Changes

- DispatchId System: New DispatchId class handles type dispatching in code generation
- ObjectCodecBuilder: Handles boxed dispatch IDs for non-nullable boxed fields
- Type ID Unification: Java native mode now shares the type IDs from BOOL through STRING with xlang mode

Related issues

Closes #3110
Closes #2914
#3099
#1017
#2906
#2982

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?

- Java: New annotations @Uint8Type, @Uint16Type, @Uint32Type, @Uint64Type, @Int32Type, @Int64Type
- C++: New FORY_FIELD_CONFIG macro for encoding configuration
- Rust: New compress and encoding attributes in #[fory(...)] derive macro
- Go: New compress and encoding options in struct tags
- Python: New type hints (fixed_int32, tagged_int64, uint32, etc.)
- All: Renamed type constants (e.g., VAR32 → VARINT32)

Does this PR introduce any binary protocol compatibility change?

- Adds new type IDs for unsigned integers (9-15)
- Existing signed integer encoding remains compatible

Benchmark

N/A - This PR focuses on correctness and cross-language compatibility. Performance characteristics of unsigned types are similar to their signed counterparts.

…rackingRef is false

When global ref tracking is enabled, serializers call reference() at the end of deserialization. If a field has trackingRef=false (e.g., in xlang mode, where all fields default to trackingRef=false), we need to push a stub -1 via preserveRefId() so that reference() can pop it and skip setReadObject.

The fix checks whether the TYPE normally needs ref tracking (ignoring field-level metadata) by using TypeRef.of(typeRef.getRawType()). This ensures the stub is pushed when needed, preventing an ArrayIndexOutOfBoundsException when the serializer calls reference() on an empty readRefIds stack.
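
The stack discipline described in this commit message can be modeled in a few lines. This is a simplified sketch based only on the text above, not the actual Java resolver: the class name and attributes are hypothetical, and Python's IndexError stands in for Java's ArrayIndexOutOfBoundsException.

```python
class RefReaderModel:
    """Toy model of the read-side ref ID stack."""

    def __init__(self):
        self.read_ref_ids = []   # stack of preserved ref IDs
        self.read_objects = {}   # ref ID -> deserialized object

    def preserve_ref_id(self, ref_id: int) -> None:
        """Push a ref ID; -1 is a stub meaning "do not track this object"."""
        self.read_ref_ids.append(ref_id)

    def reference(self, obj) -> None:
        """Called by a serializer after reading: pop the ID and record the object."""
        ref_id = self.read_ref_ids.pop()  # raises IndexError on an empty stack
        if ref_id >= 0:
            self.read_objects[ref_id] = obj


# Without the stub, the serializer's reference() call hits an empty stack.
broken = RefReaderModel()
try:
    broken.reference("value")
    failed = False
except IndexError:
    failed = True

# With the stub pushed first, reference() pops it and records nothing.
fixed = RefReaderModel()
fixed.preserve_ref_id(-1)
fixed.reference("value")
```

After the fixed path runs, the stack is empty again and `read_objects` stays empty, which matches the intent of skipping setReadObject for untracked fields.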

Use Types.getTypeId() instead of ClassResolver registered IDs for determining dispatch IDs in DefaultValueUtils. This ensures consistent type IDs between DispatchId constants and the values used in setDefaultValues. Also convert default values to correct types during initialization to avoid repeated type conversion at runtime.

## Why? ## What does this PR do? Fix performance regression introduced in #3113 ## Related issues #3113 ## Does this PR introduce any user-facing change? - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark

chaokunyang added 6 commits January 5, 2026 14:01

update spec doc

8abeba5

update buffer read/write API

0f08ce4

rename _util to buffer

d0eed13

rename _registry to registry

a9701f7

support unsigned types and refactor java type system

ea6401e

fix errors

d59b477

chaokunyang requested review from PragmaTwice and theweipeng as code owners January 7, 2026 06:11

refactor xlang numeric read/write

f95614f

chaokunyang requested review from LiangliangSui, pandalee99 and urlyy January 7, 2026 06:54

pandalee99 approved these changes Jan 7, 2026

chaokunyang added 4 commits January 7, 2026 15:49

fix rust error

68512fc

fix go unsigned support

cd34fac

fix go codegen

b870947

update c++ unsigned and compressed int support

cfc927b

chaokunyang changed the title ~~feat(java/xlang): support unsigned types for java~~ feat(java/xlang): support unsigned types for java/python/xlang Jan 7, 2026

chaokunyang added 5 commits January 7, 2026 17:13

support unsigned and configurable compress types for field

e663c1b

add javadoc to annotation

852d92c

add unsigned fields xlang tests

51688b0

revert build_linux_wheels.py

f2f5059

add unsigned java tests

43b7783

chaokunyang force-pushed the support_unsigned_types_for_java branch from 1c1ae07 to 43b7783 Compare January 7, 2026 10:10

chaokunyang added 2 commits January 7, 2026 22:18

fix descriptor sort comparator

baf70d7

fix go/java xlang struct fields serde

4d16d37

chaokunyang force-pushed the support_unsigned_types_for_java branch from 85d3b9b to af664fd Compare January 9, 2026 11:00

refactor go struct serializer

4b225ba

chaokunyang mentioned this pull request Jan 9, 2026

[Java] Serialization fails with versions later than 0.12.3 #3118

Closed

2 tasks

chaokunyang force-pushed the support_unsigned_types_for_java branch from af664fd to 4b225ba Compare January 9, 2026 16:51

chaokunyang added 14 commits January 10, 2026 14:18

add rust unsigned and compressed fields support

51e6848

update xlang tests in java side

098117b

make cpp support configure number encoding and sort fields for all numerics

1c5670b

refactor rust field meta config parse

67758cf

update go test

aaa66e8

format code

9ed0723

Merge remote-tracking branch 'asf/main' into support_unsigned_types_for_java

2f25afa

fix merge conflict

1bd9fe1

fix tests

268fd95

fix python tests

b3a9723

fix c++ tests

ad0257e

fix go tests

eb99c8c

revert DEBUG_OUTPUT_ENABLED flag

df96ffd

fix tests

b05ac57

chaokunyang changed the title ~~feat(java/xlang): support unsigned types for java/python/xlang~~ feat(xlang): support serialization for unsigned types and field encoding config Jan 10, 2026

chaokunyang added 2 commits January 10, 2026 22:10

update cpp doc for fields

c012fee

update go tag

4f7fbad

chaokunyang mentioned this pull request Jan 10, 2026

RoadMap for 1.0 #1017

Open

17 tasks

chaokunyang added 9 commits January 10, 2026 22:20

fix: add license header and fix type conversion in DefaultValueUtils

88a0194

- Add missing Apache license header to DispatchId.java
- Fix ClassCastException in DefaultValueUtils.setDefaultValues by using Number interface for type conversion instead of direct casts

fix ci

cca95fb

style: format code with spotless

fe0fb76

fix build ci

47169d8

fix(ci): correct buffer.go typo to buffer.so in macos universal2 build

db73589

update benchmark code

585e545

fix code style

64482b5

chaokunyang merged commit 724fece into apache:main Jan 10, 2026
59 checks passed

chaokunyang mentioned this pull request Jan 11, 2026

perf(go): optimize go struct fields serialization perf #3120

Merged

2 tasks

2 participants
